
Risk-sensitive Inverse Reinforcement Learning via Semi- and Non-Parametric Methods


Abstract

The literature on Inverse Reinforcement Learning (IRL) typically assumes that humans take actions in order to minimize the expected value of a cost function, i.e., that humans are risk neutral. Yet, in practice, humans are often far from being risk neutral. To fill this gap, the objective of this paper is to devise a framework for risk-sensitive IRL in order to explicitly account for a human's risk sensitivity. To this end, we propose a flexible class of models based on coherent risk measures, which allow us to capture an entire spectrum of risk preferences from risk-neutral to worst-case. We propose efficient non-parametric algorithms based on linear programming and semi-parametric algorithms based on maximum likelihood for inferring a human's underlying risk measure and cost function for a rich class of static and dynamic decision-making settings. The resulting approach is demonstrated on a simulated driving game with ten human participants. Our method is able to infer and mimic a wide range of qualitatively different driving styles from highly risk-averse to risk-neutral in a data-efficient manner. Moreover, comparisons of the Risk-Sensitive (RS) IRL approach with a risk-neutral model show that the RS-IRL framework more accurately captures observed participant behavior both qualitatively and quantitatively, especially in scenarios where catastrophic outcomes such as collisions can occur.
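To make the phrase "spectrum of risk preferences from risk-neutral to worst-case" concrete, the following minimal sketch evaluates conditional value-at-risk (CVaR), a standard example of a coherent risk measure, on a small discrete cost distribution. CVaR is chosen here only as an illustration; the abstract does not say which coherent risk measures the paper uses, and this is not the paper's inference algorithm. The function name `cvar`, its parameters, and the example numbers are all hypothetical.

```python
def cvar(costs, probs, alpha):
    """CVaR of a discrete cost distribution (illustrative sketch only).

    alpha = 1 recovers the expected cost (risk-neutral); as alpha
    shrinks toward 0 the value approaches the worst-case cost.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    # Visit outcomes from most to least costly.
    order = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    remaining = 1.0  # probability mass still to be assigned
    total = 0.0
    for i in order:
        # Reweight each outcome by at most probs[i] / alpha, so the
        # costliest outcomes receive as much mass as the bound allows.
        weight = min(probs[i] / alpha, remaining)
        total += weight * costs[i]
        remaining -= weight
        if remaining <= 0.0:
            break
    return total


# Hypothetical example: a rare but very costly outcome (e.g. a collision).
example_costs = [0.0, 1.0, 10.0]
example_probs = [0.5, 0.4, 0.1]

for a in (1.0, 0.5, 0.1):
    print(f"alpha={a}: CVaR={cvar(example_costs, example_probs, a):.3f}")
```

Lower values of `alpha` place more weight on the rare, costly outcome, which is the kind of behavior a risk-neutral expected-cost model cannot represent.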
